Agent: automatic generation of experimental protocol runtime

Due to the nature of Virtual Reality (VR) research, conducting experiments in order to validate the researcher's hypotheses is a must. However, the development of such experiments is a tedious and time-consuming task. In this work, we propose to make this task easier, more intuitive and faster with a method able to describe and generate the most tedious components of VR experiments. The main objective is to let experiment designers focus on their core tasks: designing, conducting, and reporting experiments. To that end, we propose the use of Domain-Specific Languages (DSLs) to ease the description and generation of VR experiments. An analysis of published VR experiments is used to identify the main properties that characterize VR experiments. This allowed us to design AGENT (Automatic Generation of ExperimeNtal proTocol runtime), a DSL for specifying and generating experimental protocol runtimes. We demonstrated the feasibility of our approach by using AGENT on two experiments published in the VRST'16 proceedings.


INTRODUCTION
One focus of Virtual Reality (VR) researchers is the analysis of how humans behave [Slater 2009], perceive [Interrante et al. 2006] and interact [Bowman et al. 1999] in VR environments. This research is bound to the design of user studies that validate the researchers' hypotheses. In particular, experiments in VR aim at studying the effects of a system, application, interface, algorithm, etc. on the system or its users. Indeed, according to Andrew Colman [Colman 2015], "experimental methods [...] [allow] rigorous examination of causal effects". For example, Latoschik et al. [Latoschik et al. 2016] designed a fake-mirror system that consists of a screen displaying an avatar that imitates the movements of the user. They conducted an experiment to study the effect of the nature of the avatar (more or less realistic) on the users' feeling of standing before a real mirror.
Experiments must conform to some requirements, such as defining experimental conditions, dependent and independent variables, and the experimental protocol [Field and Hole 2002]. Research has been conducted to guide experiment design in VR and augmented reality [Gabbard et al. 1999;Huang et al. 2012;Julier et al. 2001] and to help evaluate specific concepts such as presence [Schubert et al. 2001;Slater et al. 1994], acceptability [Brooke et al. 1996], or collaboration [Hornbaek 2006;Meier et al. 2007]. Yet, experiment designers still have to develop the experimental VR application, which is a tedious, repetitive, and time-consuming task.
In this work, we propose AGENT (Automatic Generation of ExperimeNtal proTocol runtime) to ease the development of experimental VR applications. AGENT automatically generates experimental protocol runtimes, letting experiment designers focus on their core tasks: designing, conducting, and reporting experiments. AGENT is based on the increasingly used Software Engineering concept of Domain-Specific Language (DSL) [Fowler 2010;Mernik et al. 2005;van Deursen et al. 2000]. DSLs ease software production by using software languages designed to tackle specific problems. DSLs are designed to have keywords, notations, and syntaxes familiar to the domain experts. DSLs should remain small, simple, and easy to use. Their design should be adapted to the organizational process of the final users [Wile 2004]. By fulfilling these criteria, DSLs make it possible to leverage the specific domain expertise of the various stakeholders involved in the development of software systems [Gabbard et al. 1999;Huang et al. 2012;Julier et al. 2001].
Once an experimental protocol has been designed as an AGENT model, the model is compiled into runnable code. The runnable code is then integrated into an already existing VR project, provided by the experiment designer.
The paper is structured as follows. Section 2 presents the state of the art of VR experiment design, the efforts made to ease the task of experiment designers, and their limits. In Section 3, an analysis of several existing and reported VR experiments is presented to characterize VR experiments. Based on this analysis, the proposed approach is detailed in Section 4. Section 5 reports and discusses the usage of our approach on two use cases. Finally, Section 6 concludes this work and presents future work.

STATE OF THE ART

Basics of Experimentation
Designing experiments is based on a rigorous set of principles [Field and Hole 2002]. According to Andrew Colman [Colman 2015], an experiment is "a research method whose defining features are manipulation of an independent variable or variables and control of extraneous variables that might influence the dependent variable". An independent variable is "a variable that is varied by the experimenter independently of the extraneous variables". A dependent variable is "a variable that is potentially liable to be influenced by one or more independent variables". The main idea behind experiments is to assess which causal relations exist between different events or facts [Colman 2015], i.e., the potential effect independent variables have on dependent variables. Dependent variables correspond to the data collected during experiments and can be qualitative or quantitative [Creswell 2013]. Asserting the existence of an apparent causal relation between independent and dependent variables, i.e., ensuring internal validity, is a necessary step [Shadish et al. 2002]. Generalizing results observed on small populations, i.e., ensuring external validity, requires the use of statistical methods [Field 2009;Pearl et al. 2014].

VR Experiment Design
Those general principles are applicable in every domain where experiments are used. In Computer Science, domains related to human-centered design need experiments to validate their approaches. Human-centered design often implies human-computer interactions: the influence of the design properties on user experience should be validated. Gabbard et al. proposed guidelines and methods for designing human-centered applications and experiments [Gabbard et al. 1999]. Other works focus on the design and evaluation of augmented reality applications [Huang et al. 2012;Julier et al. 2001].
While the classic method consists in running experiments where a population of participants uses the interfaces to evaluate, other approaches have been proposed. Stanney et al. use heuristics to guide and evaluate the design of VR interfaces [Stanney et al. 2003]. Tromp et al. propose a usability inspection method [Tromp et al. 2003], i.e., a simulation of the use of an application to find users' needs.
In human-centered design, and more specifically in VR, some specific interface characteristics are evaluated. They are often qualitative aspects, which are not easy to evaluate because of their subjectivity [Field and Hole 2002]. Research has been done to propose standard questionnaires. The cognitive load induced by systems can be evaluated with the NASA-TLX questionnaire [Hart and Staveland 1988]. The Situation Present Assessment Method (SPAM) can be used to evaluate presence [Durso et al. 2004], as can other methods [Schubert et al. 2001;Slater et al. 1994;Witmer and Singer 1998]. Brooke proposed an acceptability questionnaire [Brooke et al. 1996]. The evaluation of collaboration was studied too: Hornbaek proposed metrics based on communication (e.g., number of uttered words per person, number of questions asked to collaborators, number of interruptions) [Hornbaek 2006]. Meier et al. completed these metrics by considering other aspects of collaboration (e.g., coordination or motivation of each actor) [Meier et al. 2007].

Easing Experiment Design
While these works focus on guiding experiment designers, implementing experimental protocols to be integrated into running applications remains a manual, time-consuming and tedious task. Tools dedicated to facilitating this task exist. IBM SPSS Statistics [Field 2009] is a tool for performing statistical analyses of data. Another example of such tools is the R project. Software solutions for designing experimental conditions (e.g., variables, populations) and registering results can be used, e.g., EDA or Go-Lab, but they are mostly useful for domains other than VR (mostly biology, physics, or chemistry). VR could benefit from some tools referenced by the National Heritage Language Resource Center (NHLRC, California) that are more human-centered. More interestingly, the framework EVE was specifically designed to ease experiment setup and implementation in Virtual Environments [Grübel et al. 2016]. EVE focuses on automating data gathering and analysis but does not handle experimental protocol generation. No other solutions dedicated to VR exist.
Overall, the existing solutions are limited to experimental conditions modeling and data management. There is no code generation and the models are in general not meant to be processed by programs. Furthermore, protocol definition is often limited or nonexistent. The consequence, which we observed ourselves, is that researchers end up developing their own limited and ad hoc solutions. There are no libraries or projects that generate running code for VR experiments.

VR EVALUATIONS: A PRELIMINARY ANALYSIS
In this section we study recent research papers to draw up a precise map of the concepts related to VR experiment design. The goal is to identify the properties that characterize experiments in VR.
We have studied 15 papers and posters from the VRST'16 proceedings [Kra 2016] where experiments are conducted on various topics: displays, latency, training, presence, human behavior simulation, haptics, tracking, cinematic VR, distance perception, and 3D user interfaces. We focused on experiment design: mathematical and statistical analysis of data is not in the scope of this paper (it is left to future work). Obviously, the set of identified properties, obtained empirically, cannot cover all VR experiments. However, we consider that recent VRST papers are representative enough. Hence, the drawn properties cover a large scope of VR experiments.
The first observation we made is that designing an experiment can be divided into two main tasks: (1) experimental conditions and variable design, and (2) protocol design. We reported our analysis on two mind-maps, respectively for the concept of variable and the concept of protocol (see Figures 1 and 2). This allowed us to quickly obtain a structure of concepts specific to VR experiments. Figures 1 and 2 summarize the main concepts we identified. Section 3.1 discusses experimental conditions and variable design. Section 3.2 discusses protocol design.

Variables
Two types of variables exist: dependent and independent variables. Dependent variables can be quantitative: physiological constants (e.g., blood pressure), time for performing a task, performance of the subject, accuracy of a method. They can also be subjective feelings, e.g., how the subject appreciated a task, how comfortable it was. Questionnaires like the ones presented in Section 2 can be used to record these qualitative measures. Hence, data can come from various sources: mainly physiological sensors, software measurements, and forms. As a result, two first properties can be drawn up to characterize VR experiments:
P1: dependent variables can be of different types (integer, float, boolean, mark, customized types, etc.).
P2: dependent variables correspond to three types of data sources: sensors, software measurements, and forms.
Independent variables correspond to what is under evaluation or comparison: metaphors, navigation, interaction or rendering techniques, hardware setups, algorithms, environmental conditions, etc. Hence, independent variables correspond to software or hardware features developed or studied by the experiment designers. In experiments, two usages are possible for independent variables:
• comparison, e.g., several metaphors are compared to determine which one is the most appreciated,
• determining the effect of a single condition, e.g., one specific environmental condition is activated and then deactivated to determine the effect induced by its presence.
Independent variables can then be of two types: boolean (comparison of presence and absence) and enumeration (comparison of several conditions). The possible values of independent variables are called "levels". Two more properties can then be added:
P3: two types are possible for independent variables: boolean and enumeration.
P4: independent variables and their levels correspond to software or hardware features that can vary.
Not all possible levels are necessarily presented to each participant of an experiment. In some cases, several groups of participants, exposed to different conditions, are necessary. For example, to evaluate the effect of a navigation technique on cybersickness, two groups could be made: one group of subjects exposed to the navigation technique under evaluation and one control group, exposed to a known navigation technique that does not induce abnormal cybersickness. Conditions that are evaluated on all participants of a study are called "within-subject factors", whereas conditions that differ from one group to another are called "between-subject factors". One more property can be drawn up:
P5: between-subject factors require several groups, each of them corresponding to a distinct protocol.

Protocol
The protocol is the process that participants follow during a session. We consider that it begins when the participants start to use the VR experimental application, i.e., the protocol starts after the preliminary phases during which the participants read and sign a consent form and are associated with identifiers, for anonymity reasons. If there are between-subject factors and hence several groups, each group is associated with a different protocol, the only difference being at the level of the between-subject factors. This is the statement of property P6:
P6: the protocols of each group differ only at the level of between-subject factors.
The part of the protocol on which we focus is often separated into two phases: (1) an acclimatization and / or calibration phase and (2) a data acquisition phase. The role of the acclimatization phase is to train the participants in the use of the VR application: they should get used to the different conditions they will experience. During the acclimatization phase, participants repeat the different conditions several times, very often in a randomized order. During this phase, it is possible to adapt the number of repetitions to the participants. A calibration phase with similar characteristics can be performed, e.g., if some sensors must be calibrated.
The role of the data acquisition phase is to record data, and hence to evaluate the influence of the independent variables on the dependent variables. Participants also repeat the different conditions several times in a randomized order. The number of repetitions is generally greater than in the acclimatization phase and must be the same for each participant. A last property can be added:
P7: in acclimatization and calibration phases, the subject performs tasks but no data is collected.

APPROACH
The properties that characterize VR experiments are now used to propose an approach for easing their design and production.

Overview
The proposed approach allows VR experiments to be designed with a DSL that we built from the properties identified in Section 3. This DSL is called AGENT (Automatic Generation of ExperimeNtal proTocol runtime). Models produced with this DSL are compiled into code to be integrated into VR projects (e.g., Unity 3D projects). Figure 3 depicts the processing chain of the approach.
The use of AGENT produces an AGENT model that is then compiled into runnable code integrable into a VR project. An AGENT model is composed of two parts: an experimental conditions model and a protocol model. It is up to the experiment designer to provide to AGENT all the VR components that are not directly related to the experiment (i.e., 3D models, metaphors, interactions, etc.).
The remainder of Section 4 is organized as follows. Section 4.2 introduces an illustrative example as a basis for the detailed explanation of the approach. Section 4.3 presents the experimental conditions model structure. Section 4.4 presents the protocol model structure. Section 4.5 ends the presentation of the approach by presenting the compilation and integration steps.

Illustrative Example
Consider a VR experiment whose independent variables are:
• I_b, a boolean variable (between-subject factor),
• I_e, with two levels: L_1 and L_2 (within-subject factor).
The dependent variables are:
• D_q, answers to a questionnaire, in the form of Likert-scale marks, for evaluating the task with respect to the possible values of the independent variables,
• D_s, speed of execution of the task.
The protocol is as follows:
(1) acclimatization phase: the subject executes the task 2 × 4 times, i.e., 4 times each condition among the condition set C_b or C_¬b, depending on the group (see Equations (1) and (2)), with no data being recorded,
(2) data acquisition: the subject executes the task 2 × 32 times, i.e., 32 times each condition among C_b or C_¬b, with D_s being recorded,
(3) subjective evaluation: the subject answers the questionnaire.
$$C_b = \{(I_b, L_1), (I_b, L_2)\} \qquad (1)$$
$$C_{\neg b} = \{(\neg I_b, L_1), (\neg I_b, L_2)\} \qquad (2)$$
The two groups, respectively exposed to the conditions C_b and C_¬b, are called G_b and G_¬b.
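The arithmetic of this example can be checked with a short Python sketch. It assumes that each condition pairs the group's value of the between-subject factor with one level of the within-subject factor; the function names and the string labels "L1" and "L2" are hypothetical and are not part of AGENT.

```python
from itertools import product

# Levels of the within-subject factor I_e (labels chosen for this sketch).
I_E_LEVELS = ["L1", "L2"]


def condition_set(i_b):
    """Conditions seen by one group: a fixed I_b value paired with every I_e level."""
    return list(product([i_b], I_E_LEVELS))


def trial_count(conditions, repetitions_per_condition):
    """Total number of trials when every condition is repeated equally often."""
    return len(conditions) * repetitions_per_condition


c_b = condition_set(True)       # conditions of the group exposed to I_b
c_not_b = condition_set(False)  # conditions of the group not exposed to I_b

acclimatization_trials = trial_count(c_b, 4)   # 2 conditions x 4 repetitions
acquisition_trials = trial_count(c_b, 32)      # 2 conditions x 32 repetitions
```

Under these assumptions both groups go through the same number of trials; only the value of the between-subject factor differs.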

Experimental Conditions Model
The experimental conditions model is the part of an AGENT model that describes the independent and dependent variables. Figure 4 depicts the concepts of this DSL in the form of a UML class diagram. Figure 5 shows the experimental conditions model designed using AGENT that corresponds to the example of Section 4.2.
An experimental conditions model has a tree-like structure. The tree is composed of three mandatory nodes: (1) the root that defines the name of the model, and its two child nodes; (2) the "independent variables" node; (3) the "dependent variables" node. The independent and dependent variables are respectively to be defined under the nodes (2) and (3), as their child nodes.
According to P3, there are two types of independent variables: boolean and enumeration variables. Enumeration variables are constituted of possible levels, each level being defined under the node corresponding to their variable (e.g., see Figure 5).
There are two types of dependent variables: objective and subjective variables. Objective variables are data collected from software or hardware sources (physiological sensors and software measurements of P2) and can be of various types (e.g., boolean, integer, enumeration, float, customized types, etc.), according to P1. The linkage of objective dependent variables to their data source is managed by the integration module (see Section 4.5), not by the experimental conditions model, where only the names of the variables can be provided (e.g., D_s). Subjective variables are questions asked to participants and their answers, gathered into special forms designed by the experimenters (see P2). In the experimental conditions model, the experiment designer can define forms with an identifier (e.g., D_q) and a hyperlink giving access to the form. In the remainder of the paper, all the leaves of an experimental conditions model (i.e., boolean independent variables, variable levels, questionnaires, and objective dependent variables) will be called features.
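The tree structure and the notion of feature can be illustrated with a minimal Python sketch of the model of the running example. The `Node` class and its `leaves` method are hypothetical names introduced for this illustration; they do not reflect the actual AGENT metamodel of Figure 4.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of the experimental conditions tree (illustrative only)."""
    name: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Names of the leaves under this node, in depth-first order."""
        if not self.children:
            return [self.name]
        names = []
        for child in self.children:
            names.extend(child.leaves())
        return names


model = Node("Example", [
    Node("independent variables", [
        Node("I_b"),                            # boolean variable: itself a leaf
        Node("I_e", [Node("L1"), Node("L2")]),  # enumeration: its levels are leaves
    ]),
    Node("dependent variables", [
        Node("D_q"),  # questionnaire (subjective variable)
        Node("D_s"),  # objective variable
    ]),
])

# The features are the leaves of the tree.
features = model.leaves()
```

Note that the enumeration variable I_e is not a feature itself in this sketch: only its levels are, which matches the definition of features as leaves.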

Protocol Model
Description.
Figure 6 depicts the concepts that are present in protocol models in the form of a UML class diagram. It shows that AGENT allows experimental protocols to be defined as lists. Figure 7 shows such a list, which contains three elements separated by arrows. Some elements are composed, e.g., the second element in Figure 7 is composed of Condition1, Condition2, and Acclim. Figure 7 shows the protocol model designed using AGENT that corresponds to the group G_b of the example presented in Section 4.2. The protocol of group G_¬b is the same, but feature I_b is not present (set to false). Note that between-subject and within-subject factors are implicitly defined in AGENT. The distinction is made if several groups (hence several protocol models) exist: variables corresponding to between-subject factors are the ones that vary from one protocol model to another. Properties P5 and P6 are then satisfied.
A protocol model is a list of states with five types of state: start state (green disk in Figure 7), end state (red disk), simple state (yellow rectangle), random-loop state (purple rectangle), and customized-loop state (orange rectangle, see Figure 11). The core idea behind protocol models is that each phase of the experiment (simple, random-loop, and customized-loop states) corresponds to the selection of a subset of the features defined by an experimental conditions model. Consider the protocol given in the example of Section 4.2. First, the subject goes through an acclimatization phase in which the task is performed 8 times, covering 2 conditions that are combinations of the possible values of the independent variables. Second, the subject does the same thing but with 64 repetitions and with data being recorded, i.e., considering the effect of the independent variables on the dependent variable D_s. Third, the subject completes the questionnaire corresponding to the set of dependent variables identified by D_q. Hence, each step corresponds to the selection of several features from the experimental conditions model of Figure 5. For each step of a protocol model, the selected features are listed under the "Features" label (see Figure 7). For example, in the simple state SubjectiveEval, there is only one feature selected: D_q.
The case of loop states (random-loop and customized-loop) is more complex. A loop state models the repetition of a task under different conditions. The number of repetitions for each condition, i.e., the multiplicity (see Figure 6), is indicated at the top-right corner of each loop state (see Figure 7). Conditions are modeled as blue rectangles linked to the loop state that covers them. Conditions make references to features, as do loop states. If a feature is held by the loop state itself, then this feature is active for all repetitions and conditions. For example, in the DataAcq state corresponding to step (2) of the illustrative protocol, data is recorded whatever the current condition. The between-subject factor I_b is also set to true for the group G_b in all cases. Hence, the feature D_s is held by the random-loop state (as is I_b for G_b). If a feature is held by a condition, then the feature is active only for repetitions where the condition is active.
Random-loop states model repetitions where conditions come in a random order. The multiplicity is an integer (see Figure 6) that represents the number of times each condition will be repeated. Customized-loop states model other kinds of repetitions, e.g., deterministic or based on the participant's choice. Experiment designers sometimes propose phases where participants can repeat conditions on demand [Medeiros et al. 2016]. To handle this case, the multiplicity of customized-loop states is not an integer but an interval (see Figure 6). This way, the experiment designer can make the number of loops constant (notation "n"), limited (notation "n..N"), or unlimited (notation "n..*").
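The three multiplicity notations can be made concrete with a small Python sketch. The parser below is an assumption made for illustration: the paper does not describe how AGENT reads multiplicities, and the names `Multiplicity`, `parse_multiplicity`, `may_stop` and `must_stop` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Multiplicity(NamedTuple):
    minimum: int
    maximum: Optional[int]  # None stands for "unlimited"


# Accepts "n", "n..N" and "n..*", with optional spaces around the parts.
_PATTERN = re.compile(r"^\s*(\d+)\s*(?:\.\.\s*(\d+|\*)\s*)?$")


def parse_multiplicity(text):
    """Turn a multiplicity notation into a (minimum, maximum) interval."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"invalid multiplicity: {text!r}")
    low = int(match.group(1))
    high_text = match.group(2)
    if high_text is None:   # constant: "n"
        return Multiplicity(low, low)
    if high_text == "*":    # unlimited: "n..*"
        return Multiplicity(low, None)
    high = int(high_text)   # limited: "n..N"
    if high < low:
        raise ValueError(f"upper bound below lower bound: {text!r}")
    return Multiplicity(low, high)


def may_stop(mult, done):
    """The participant may leave the loop once the minimum has been reached."""
    return done >= mult.minimum


def must_stop(mult, done):
    """The loop ends unconditionally once the maximum, if any, has been reached."""
    return mult.maximum is not None and done >= mult.maximum
```

In this reading, a random-loop state is the special case where minimum and maximum coincide, so the loop can neither stop early nor continue past the fixed count.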
Note that the distinction between acclimatization and data acquisition phases is illustrated here: acclimatization phases are loop states where data recording is deactivated (no dependent variable in the feature list of the loop state). Data acquisition phases are, on the contrary, loop states (in general random-loop states) where data recording is activated. In our approach, acclimatization and calibration phases are represented the same way: extraneous calibrations of any kind are not managed by AGENT. P7 is then satisfied.
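The two rules described for loop states (features held by the loop are always active, and a phase records data only if the loop holds a dependent variable) can be sketched in Python as follows. The `LoopState` class is hypothetical, and the assignment of L1 and L2 to Condition1 and Condition2 is an assumption about the content of Figure 7.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    """Illustrative loop state: features held by the loop and by each condition."""
    name: str
    features: set = field(default_factory=set)
    conditions: dict = field(default_factory=dict)  # condition name -> set of features

    def active_features(self, condition):
        """Features active during a repetition of the given condition."""
        return self.features | self.conditions[condition]

    def records_data(self, dependent_variables):
        """A loop records data only if it holds at least one dependent variable."""
        return bool(self.features & set(dependent_variables))


# Loop states of the group exposed to I_b (condition contents are assumed).
conditions = {"Condition1": {"L1"}, "Condition2": {"L2"}}
acclim = LoopState("Acclim", {"I_b"}, conditions)
data_acq = LoopState("DataAcq", {"I_b", "D_s"}, conditions)
```

With this representation, the only difference between the acclimatization and the data acquisition loops is the presence of D_s in the loop's own feature list.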

Discussion.
The choice of modeling protocols as lists comes from the preliminary study (see Section 3). Lists are sufficient because of the nature of VR experiments: protocols do not allow alternatives. However, when there are between-subject variables, the protocol varies from one group to another. In our approach, the experiment designer simply has to make one protocol model per group. The only variations between the different protocols are at the level of the features corresponding to between-subject variables. That is why the concepts of between-subject and within-subject variables do not appear explicitly in AGENT.
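Since between-subject factors are implicit, they can in principle be recovered by comparing the protocol models of the groups. The Python sketch below illustrates this idea on the running example; it is not taken from AGENT, the function names are hypothetical, and each state is simplified to the set of all features it references (including those held by its conditions).

```python
def between_subject_features(protocols):
    """Features that are not shared by the protocols of all groups."""
    per_group = [set().union(*states) for states in protocols.values()]
    return set().union(*per_group) - set.intersection(*per_group)


def differ_only_by(protocols, factors):
    """Check that the protocols are identical once the given features are removed."""
    stripped = [[state - factors for state in states] for states in protocols.values()]
    return all(states == stripped[0] for states in stripped)


# One list of per-state feature sets per group (acclimatization, acquisition, questionnaire).
protocols = {
    "G_b": [{"I_b", "L1", "L2"}, {"I_b", "D_s", "L1", "L2"}, {"D_q"}],
    "G_not_b": [{"L1", "L2"}, {"D_s", "L1", "L2"}, {"D_q"}],
}

factors = between_subject_features(protocols)
```

Here `factors` contains only I_b, and `differ_only_by(protocols, factors)` holds, which corresponds to the statement of P6 for this example.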
Other variations may appear at the task level. The subject could have the choice to execute some actions in any order. For example, in a task consisting in selecting multiple objects in the Virtual Environment, the subject could choose the order in which the objects are selected. However, these variations are at the level of the use-case and do not correspond to variations of the protocol itself. Variations in the use-case are out of the scope of this paper, which is why AGENT does not manage them. It is up to the experiment designer to manage these variations and provide them to AGENT along with the VR elements not related to the experiment itself (see Figure 3).

Code Generation and Integration
After an experimental protocol has been modeled, runnable code can be generated with the AGENT compiler. The generated runnable code can then be integrated into the VR project through the AGENT integration module. The runnable code is characterized in Section 4.5.1. The transformation operations that compile AGENT models into runnable code are detailed in Section 4.5.2. The integration module is presented in Section 4.5.3.

Generated Runnable Code Characterization.
As Figure 3 shows, the generated runnable code can be characterized by a runtime model. The runtime model is a state machine. For example, the runtime model generated from the example of Section 4.2 (group G_b) is represented in Figure 8. In the figure, the rounded rectangle is a state that itself contains a state machine. Conditions triggering a transition are indicated between brackets. Actions resulting from the triggering of a transition follow the slash. The variable t.nb is the number of executed trials. The variable f is the set of selected features. The [r] condition means that the transition is triggered randomly. Final states are marked by a cross.

Compilation: from AGENT Model to Runnable Code.
Figure 8 already gives the intuition of the transformation algorithm (from protocol model to runtime model). For the sake of simplicity and concision, the formal algorithm is not given; only its intuition is presented.
Transformation of Start, End, and Simple States. The transformation operator applied to the start, end, and simple states is the identity: they are respectively transformed to initial, final, and standard states of the runtime model.

Transformation of Transitions. Two possibilities exist:
(1) the origin state of the transition is a start, end, or simple state. In that case, the transformation operator is the identity.
(2) the origin state of the transition is a loop state. In that case, the generated transition is conditional: it can only be taken once the number of loops defined in the loop state has been run.
In all cases, the generated transitions must trigger events: the deselection of the features held by the origin state and the selection of the features held by the target state.
Transformation of Loop States. Loop states are transformed to sub-state-machines. Each condition is transformed to a state. The sub-state-machine must loop on each of these states a number of times that conforms to the multiplicity of the loop state. Features held by the conditions must be selected or deselected accordingly.
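Since the formal algorithm is not given, the following Python sketch only illustrates one possible reading of the behavior of a sub-state-machine generated from a random-loop state. It assumes that the random order is obtained by shuffling a schedule in which each condition appears as many times as the multiplicity; the names `Condition` and `random_loop_trace` are hypothetical, and the counter `nb` plays the role of t.nb.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    name: str
    features: frozenset


def random_loop_trace(conditions, multiplicity, loop_features, seed=0):
    """Return the ordered events produced by one pass through a random-loop state."""
    rng = random.Random(seed)  # constant seed: the order is reproducible
    # Every condition is scheduled `multiplicity` times, then the schedule is shuffled.
    schedule = [c for c in conditions for _ in range(multiplicity)]
    rng.shuffle(schedule)

    # Entering the loop state selects the features held by the loop itself.
    events = [("select", f) for f in sorted(loop_features)]
    nb = 0  # number of executed trials
    for cond in schedule:
        events += [("select", f) for f in sorted(cond.features)]
        events.append(("trial", cond.name))
        nb += 1
        events += [("deselect", f) for f in sorted(cond.features)]

    # Guard of the outgoing transition: all the scheduled loops have been run.
    assert nb == multiplicity * len(conditions)
    # Leaving the loop state deselects the features it holds.
    events += [("deselect", f) for f in sorted(loop_features)]
    return events


conds = [Condition("Condition1", frozenset({"L1"})),
         Condition("Condition2", frozenset({"L2"}))]
trace = random_loop_trace(conds, multiplicity=4, loop_features={"I_b"}, seed=1)
trials = [name for kind, name in trace if kind == "trial"]
```

Other sequencing policies (for instance drawing the next condition at random among those that still have repetitions left) would satisfy the same multiplicity constraint.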

Code Integration.
The integration module is composed of three libraries that the experiment designer has to use in order to properly integrate the generated runnable code into the VR project. This section presents these libraries conceptually. Their concrete usage is presented in Section 5.
Feature Binding Library. The experiment designer must bind the experimental conditions model features to runtime features (e.g., metaphors, interactions, virtual objects, etc.) that they provide (see Figure 3). For example, the feature L_1 represented in Figure 5 could be bound to a virtual object (runtime feature). The selection (resp. deselection) of L_1 in the runtime model would make the virtual object appear (resp. disappear). Conceptually, the feature binding library is a map that links experimental conditions model features to runtime features, with two functions to implement for specifying what happens at each selection / deselection of a feature. Objective dependent variables are associated with their data source (e.g., a field in a class, the output of a measurement tool, etc.), which can be of various types. Properties P1, P2, and P4 are then satisfied.
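The conceptual view of the feature binding library as a map with two callbacks per feature can be sketched in Python as follows. This is not the Unity library itself: `FeatureBindings`, `VirtualObject` and the feature label "L1" are hypothetical names used only to illustrate the idea.

```python
class FeatureBindings:
    """Illustrative map from feature names to selection / deselection callbacks."""

    def __init__(self):
        self._bindings = {}

    def bind(self, feature, on_select, on_deselect):
        """Associate a feature with what happens when it is selected or deselected."""
        self._bindings[feature] = (on_select, on_deselect)

    def select(self, feature):
        self._bindings[feature][0]()

    def deselect(self, feature):
        self._bindings[feature][1]()


class VirtualObject:
    """Stand-in for a runtime feature such as an object of the virtual scene."""

    def __init__(self):
        self.visible = False

    def show(self):
        self.visible = True

    def hide(self):
        self.visible = False


# Binding the level L1 to a virtual object: selecting it makes the object appear.
cube = VirtualObject()
bindings = FeatureBindings()
bindings.bind("L1", on_select=cube.show, on_deselect=cube.hide)
```

The runtime model then only needs to call `select` and `deselect` with feature names; what these calls do to the virtual environment stays entirely in the hands of the experiment designer.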
Condition Sequencing Library. This library provides several algorithms for sequencing the conditions in the case of loop states: random with constant seed, random with time-based seed, deterministic with fixed number of loops, controlled by the user, etc.
Trial Completion Management Library. Trial completion is managed by the runtime model at the experiment level, i.e., loop conditions are changed after each trial completion. However, trial completion must be detected and the resulting effects on the Virtual Environment must be triggered. Consider for example a task consisting in going from a departure point D to an arrival point A. Trial completion should be detected when the arrival point A is reached and the triggered effect would be to make the avatar automatically return to the departure point D, for starting a new trial with other conditions. The experiment designer can use the trial completion management library to add conditions and effects corresponding to trial completion to the transitions of the runtime model.
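The departure / arrival example can be written as a pair of functions, one for the completion condition and one for the completion effect. The Python sketch below is only an illustration of that split: the coordinates, the tolerance and all names are assumptions, and the actual library is a Unity script.

```python
import math
from dataclasses import dataclass


@dataclass
class Avatar:
    x: float
    y: float


DEPARTURE = (0.0, 0.0)   # point D (assumed coordinates)
ARRIVAL = (10.0, 0.0)    # point A (assumed coordinates)
TOLERANCE = 0.5          # assumed distance under which A counts as reached


def trial_completed(avatar):
    """Completion condition: the avatar has reached the arrival point."""
    return math.dist((avatar.x, avatar.y), ARRIVAL) <= TOLERANCE


def on_trial_completed(avatar):
    """Completion effect: the avatar is sent back to the departure point."""
    avatar.x, avatar.y = DEPARTURE


def step(avatar):
    """Called regularly; returns True when a trial has just been completed."""
    if trial_completed(avatar):
        on_trial_completed(avatar)
        return True
    return False
```

A return value of `True` is the point at which the runtime model would count the trial and move on to the next condition.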

Property Fulfillment
Our approach satisfies the properties listed in Section 3.
• P1: the type of an experimental conditions model feature is the type of its bound runtime feature, which is freely determined by the experiment designer (see Section 4.5.3).
• P2: Questionnaire dependent variables are bound to forms through a hyperlink (see Figures 4 and 5). ValueSource dependent variables are bound to data sources whose nature is determined by the experiment designer.

Implementation
AGENT was developed using DSL Tools, a Visual Studio 2015 extension for creating DSLs. The AGENT compiler has been developed in C#. Once an AGENT model has been designed by an experiment designer, code can be automatically generated from this model by the compiler. The generated code consists of XML files that implement the state machine structure of the runtime model. It comes along with pre-coded C# classes and an interpreter, provided with AGENT, that implement the dynamic aspect of the runtime model. The generated classes and the interpreter are meant to be used with Unity 3D (we discuss the adaptation of AGENT to other VR platforms in Section 5.3). The integration module is a Unity library, presented in more detail in Section 5.1.

CREATING AN EXPERIMENT USING AGENT
In this section, we explain step by step the usage of AGENT to produce an experiment. Section 5.2 presents the use of AGENT on two real use-cases reported in the ACM VRST 2016 proceedings [Kra 2016]. We end this section with a discussion (Section 5.3).

Usage
The usage of AGENT is composed of three main steps: modeling, code generation, and code integration.
Modeling.
In our implementation, the experimental conditions model and the protocol model are created using AGENT, which was developed as a set of Visual Studio extensions. To produce one of the models, the experiment designer first has to create a new Visual Studio project with the appropriate model type. The model can then be created by dragging and dropping the different components (e.g., experimental conditions model features, protocol model states, transitions, etc.) onto a design area. The text fields and feature lists can be edited directly on the created components. Figures 5, 7, 10 and 11 show examples (screen captures) of models built using the AGENT DSL within Visual Studio. Once the models are designed, they are saved into XML files that are the input data for the code generation step.

5.1.2 Code Generation. The compiler takes the XML files as input and generates an output XML file that represents the structure of the state machine. The output XML refers to C# classes that we pre-coded and that are provided with AGENT. These classes are responsible for implementing the transitions.
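As an illustration of this step, the following minimal sketch translates a protocol model into a state-machine description. It is written in Python rather than C#, and it is not the AGENT compiler: the element and attribute names (`protocol`, `state`, `kind`, `stateMachine`, `transition`) are hypothetical, and the sketch assumes that states are simply chained in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical input format: these element and attribute names are
# illustrative only and are not the actual AGENT file format.
PROTOCOL_MODEL = """
<protocol>
  <state name="Training" kind="customized-loop"/>
  <state name="Experiment" kind="random-loop"/>
  <state name="PostQuestionnaire" kind="simple"/>
</protocol>
"""

def compile_protocol(model_xml: str) -> str:
    """Translate a protocol model into a state-machine description (XML string)."""
    states = ET.fromstring(model_xml.strip()).findall("state")
    machine = ET.Element("stateMachine")
    for index, state in enumerate(states):
        node = ET.SubElement(
            machine, "state", name=state.get("name"), kind=state.get("kind")
        )
        # Assumption: each state transitions to the next one in document order;
        # the last state has no outgoing transition.
        if index + 1 < len(states):
            ET.SubElement(node, "transition", target=states[index + 1].get("name"))
    return ET.tostring(machine, encoding="unicode")

if __name__ == "__main__":
    print(compile_protocol(PROTOCOL_MODEL))
```

A real compiler would also have to carry over the conditions attached to loop states and the references to the pre-coded transition classes, which this sketch omits.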

5.1.3 Code Integration. All the integration is done in Unity through one provided prefab with three child nodes, one for each library defined in Section 4.5.3. Each library consists of Unity scripts that must be subclassed to produce new scripts, which are then attached to the nodes of the provided prefab.
Feature Binding Library. Figure 9 shows an example of feature binding in Unity, based on the example of Section 4.2. This library contains several scripts, each of which binds a feature to a particular kind of Unity resource (e.g., GameObject, script, field, etc.). Each of these scripts defines at least two fields: one for specifying the feature name, and another for specifying the Unity resource it is bound to. They also define two methods to be implemented by the experiment designer, for managing feature selection and deselection.
To bind features to external data sources (e.g., physiological sensors), the experiment designer can bind the desired features to a Unity script which reads the external data.
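The structure of such a binding script can be sketched as follows. This is a Python analogue, not the Unity library itself: the class and method names (`FeatureBinding`, `on_select`, `on_deselect`, `apply_condition`) are hypothetical, and a plain dictionary stands in for the Unity resource.

```python
from abc import ABC, abstractmethod

class FeatureBinding(ABC):
    """Binds one model feature to a resource (hypothetical analogue of a binding script)."""

    def __init__(self, feature_name, resource):
        self.feature_name = feature_name  # name of the feature in the model
        self.resource = resource          # stand-in for the bound Unity resource

    @abstractmethod
    def on_select(self):
        """Called when the feature becomes part of the current condition."""

    @abstractmethod
    def on_deselect(self):
        """Called when the feature is no longer part of the current condition."""

class VisibilityBinding(FeatureBinding):
    """Example subclass written by the experiment designer."""

    def on_select(self):
        self.resource["active"] = True

    def on_deselect(self):
        self.resource["active"] = False

def apply_condition(bindings, selected_features):
    """Select the features of the current condition and deselect all others."""
    for binding in bindings:
        if binding.feature_name in selected_features:
            binding.on_select()
        else:
            binding.on_deselect()
```

For instance, calling `apply_condition(bindings, {"visible"})` activates the resource bound to the `visible` feature and deactivates the resources bound to every other feature.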
Condition Sequencing Library. This library is composed of two scripts, corresponding to the two types of loop states. They associate a loop state with the sequencing algorithms we provide. In particular, the customized-loop script detects user requests to either end the loop or switch conditions.
Trial Completion Management Library. This library is composed of one script that defines two functions to implement, one specifying the trial completion condition and one specifying the trial completion effect. A field references the protocol model state the script is bound to.
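The two kinds of loop states can be sketched as follows, in Python rather than Unity C#. The function names are hypothetical, and two points are assumptions of this sketch rather than documented AGENT behavior: the multiplicity of a random loop is read as the number of times each condition is presented, and a "switch" request in a customized loop moves to the next condition in cyclic order.

```python
import random

def random_loop(conditions, multiplicity=1, rng=None):
    """Return a presentation order where each condition appears `multiplicity` times.

    Assumption: conditions are shuffled independently within each repetition.
    """
    rng = rng or random.Random()
    order = []
    for _ in range(multiplicity):
        block = list(conditions)
        rng.shuffle(block)
        order.extend(block)
    return order

def customized_loop(conditions, requests):
    """Return the conditions visited, driven by user requests.

    `requests` is a sequence of "switch" or "end" strings (hypothetical encoding
    of the user requests detected by the customized-loop script).
    """
    current = 0
    visited = [conditions[current]]
    for request in requests:
        if request == "end":
            break  # the user asked to leave the loop
        if request == "switch":
            current = (current + 1) % len(conditions)
            visited.append(conditions[current])
    return visited
```

With two conditions `["TrainingC1", "TrainingC2"]` and the requests `["switch", "switch", "end"]`, the customized loop visits TrainingC1, TrainingC2, then TrainingC1 again before ending.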

Use-cases
We used our approach on two experiments. The first use-case is an experiment we reported [Le Moulec et al. 2016]. The AGENT model was made, and the code was generated and then integrated into the Unity project of the associated VR application (the code that was previously responsible for the protocol runtime was removed). The refactored experiment is fully functional.
The second use-case was the experiment reported by Mossel et al. [Mossel and Koessler 2016]. In this experiment, Mossel et al. compare two segmentation and selection techniques (Raycast and CutPlane). They evaluate their effects on user efficiency as a function of the difficulty of the selection task. The AGENT model was made and the code was generated. Section 5.2.1 presents the models we made from this work (Figures 10 and 11). Section 5.2.2 presents the code integration process for producing the complete experiment.
Figure 10 shows the experimental conditions model made with AGENT from the work of Mossel et al. In the presentation of their experiment, Mossel et al. clearly state the independent variables: "The participants had to use both Raycast and [...]".
Figure 11 shows the protocol model. The phases are: "training phase, [...] experiment, and [...] a post-questionnaire". The three phases are represented by the three states of the protocol model in Figure 11: Training, Experiment, PostQuestionnaire. PostQuestionnaire is a simple state because the participant only has to answer the questions. The Experiment state is a random-loop of multiplicity 1 with six conditions combining the values of the two independent variables (see Figure 11). The justification can be found in two sentences from Mossel et al.: "The participants had to use both Raycast and CutPlane in combination with all three scenarios. That results in total in six different tasks which the participants perform in random order".
Training corresponds to an acclimatization phase. Mossel et al. give this description: "[Participants] were freely interacting in a test environment, which comprised a simple Unity3D scene with some artificial virtual objects that could be segmented and selected and where objects' visibility ranged from visible to fully occluded. As soon as the user reported to feel confident, the experiment stage [...] started".
Hence, Training is a customized-loop state with unlimited multiplicity (0..*). All visibility conditions are present in the Virtual Environment (visible, partOccl, fulOccl). The user can switch between the two selection techniques, which explains the two conditions TrainingC1 and TrainingC2 in Figure 11.
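The six conditions of the Experiment state follow directly from crossing the two independent variables. The short Python sketch below makes this explicit. The value names are taken from the model described above, while the function names and the use of a seeded generator are assumptions made for illustration.

```python
from itertools import product
import random

TECHNIQUES = ["Raycast", "CutPlane"]
VISIBILITIES = ["visible", "partOccl", "fulOccl"]

def experiment_conditions():
    """Cross the two independent variables: 2 techniques x 3 visibilities."""
    return list(product(TECHNIQUES, VISIBILITIES))

def randomized_order(seed=None):
    """Return the six conditions in random order (random-loop of multiplicity 1)."""
    conditions = experiment_conditions()
    random.Random(seed).shuffle(conditions)
    return conditions

if __name__ == "__main__":
    for technique, visibility in randomized_order():
        print(technique, visibility)
```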

5.2.2 Code Integration. With the feature binding library, the experiment designers can bind the independent variables to the Unity components that manage them. In the state Training, the three visibility conditions are selected at the same time, which simply means that they coexist in the test environment. The objective dependent variables (TaskDuration, WalkDistance, SegmentationMiss, SelectionMiss) are to be bound to computed numeric fields (i.e., fields in Unity scripts) that are updated automatically by the VR application and that correspond, for each trial, to the task duration, the walk distance, the number of inappropriate segmentation attempts, and the number of inappropriate selection attempts.
With the condition sequencing library, the experiment designers can choose the implementation they want for the random-loop state. They can also end the Training loop state by detecting the corresponding user request, performed on the UI. Switching between the conditions TrainingC1 and TrainingC2 is also done by detecting a user request.
With the trial completion management library, the experiment designers can specify the trial end condition: the user has performed the segmentation and selection task. This can be detected, for example, through a Unity script. The action to perform to begin the new trial is to reset the position of the user to the start position.
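The trial completion logic for this use-case can be sketched as follows. This is a Python analogue of the two functions to implement, not the Unity script: the names `TrialCompletion`, `is_complete`, `on_complete` and `tick` are hypothetical, and a dictionary stands in for the state of the VR application.

```python
class TrialCompletion:
    """Hypothetical analogue of the trial completion script."""

    def __init__(self, state_name):
        self.state_name = state_name  # protocol model state this script is bound to

    def is_complete(self, app):
        """Trial completion condition, to be implemented by the designer."""
        raise NotImplementedError

    def on_complete(self, app):
        """Trial completion effect, to be implemented by the designer."""
        raise NotImplementedError

class SelectionTrial(TrialCompletion):
    def is_complete(self, app):
        # The trial ends once the user has segmented and selected the target.
        return app["segmented"] and app["selected"]

    def on_complete(self, app):
        # Reset the user to the start position and clear the task flags.
        app["user_position"] = app["start_position"]
        app["segmented"] = False
        app["selected"] = False

def tick(trial, app):
    """Check the completion condition once; return True if the trial ended."""
    if trial.is_complete(app):
        trial.on_complete(app)
        return True
    return False
```

In Unity, a check such as `tick` would typically run once per frame, but how the interpreter polls the condition is not specified here and is left as an assumption.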

Discussion
AGENT is made to work with Unity. However, it is possible to adapt it to other VR platforms (e.g., CRYENGINE, Unreal, etc.). The modeling part of AGENT does not need to be modified. We estimate that the compiler needs to be partially re-implemented. The integration module needs to be totally re-implemented. Nevertheless, we estimate that the induced effort is minimal, because the re-developed components are reusable and remain small.
The compiler only needs to translate the protocol model into a state machine. Estimating the effort for implementing the integration module is more difficult because it highly depends on the target platform. However, if we base our estimation on our implementation, the only task is to develop about a dozen classes, each containing no more than ten lines of field and function definitions.
Furthermore, producing an experiment with our implementation can be done in a few hours. To integrate the code, some functions must be implemented (see Section 5.1.3). We estimate that each of them requires at most five lines of code on average.

CONCLUSION AND FUTURE WORKS
In this paper, we presented an approach for the automatic generation of experimental protocol runtime. We conducted a study on fifteen experiments reported in the VRST'16 proceedings [Kra 2016], to determine the properties our approach should satisfy. Seven properties could be drawn up. These properties define the main concepts our approach has to take into consideration: independent and dependent variables; between and within subject factors; acclimatization, calibration, and data acquisition phases of the protocol. They also highlight the diversity of the independent and dependent variables in experiments: variables can be software or hardware features, measurements or subjective analysis, data can be produced by very specific devices, e.g., for measuring physiological constants.
We designed an approach that conforms to these properties. We then introduced the AGENT DSL, which generates the experimental protocol runtimes. More specifically, AGENT makes it possible to write experimental conditions models and protocol models. These models are then compiled into runnable code, for integration in a VR project.
We demonstrated that our approach conforms to the seven properties deduced from the preliminary analysis. The experimental conditions model makes it possible to define independent and dependent variables, and the protocol model makes it possible to define acclimatization, standardization, and data acquisition phases. Between and within subject factors are implicitly defined by the use of several protocol models. Finally, the integration module makes it possible to bind independent and dependent variables to any hardware or software resource, and hence to handle their potential diversity.
We demonstrated the feasibility of our approach by using AGENT on two reported experiments. One of them was completely rebuilt, and for the other one we generated the code. The 15 experiments reported in the VRST'16 proceedings [Kra 2016] can be generated using AGENT.
In future work, we intend to evaluate AGENT in a user study to assess its efficiency compared to manual implementation. We also plan to evaluate its replicability with engines other than Unity. Moreover, we plan to extend this work to the statistical analysis step, by performing it automatically after the experiment has been conducted. Finally, we plan to investigate further the automatic production of VR applications. In domains other than experiments, VR applications are still produced manually, without code or concept reuse. In particular, we plan to propose an approach for automatically generating VR applications for training.